feat: QUIC agent tunnel — protocol, listener, agent client#1738
irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 14 commits into master
QUIC Agent Tunnel: Technical Specification

1. Enrollment

How an agent gets its certificate
Key property: the private key never leaves the agent machine.

Enrollment token
The enrollment token is either:

2. Stream Multiplexing

One QUIC connection, many independent streams
Each stream is independently ordered.

How a new session is established
No new QUIC handshake is needed; streams are opened instantly on the existing connection.

Message encoding
All control and session setup messages use length-prefixed bincode.

Size limits
Limits are enforced on the length prefix (before reading the payload) and on the bincode deserializer (which prevents crafted payloads with huge internal Vec lengths).

3. User Experience

Network topology

Admin setup (one-time)

End-user workflow (daily use)
The user has no awareness of the agent. From their perspective:
What happens behind the scenes:
No VPN. No inbound firewall rules on the office network. No routing configuration.

Transparent routing rules
When a connection request arrives, the gateway evaluates routing in priority order:
When multiple agents match the same target, the most recently seen agent is tried first.

Resilience
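The "Message encoding" and "Size limits" notes in the specification can be illustrated with a minimal framing sketch. Everything here is an assumption made for illustration: the 4-byte big-endian prefix, the `MAX_FRAME_LEN` value, and the function names are hypothetical and are not taken from the PR. The point it shows is that the length prefix is checked before any payload buffer is allocated.

```rust
use std::io::{self, Read, Write};

/// Hypothetical cap for this sketch; the real limits are not shown in the spec excerpt.
pub const MAX_FRAME_LEN: u32 = 64 * 1024;

/// Writes `[4-byte big-endian length][payload]`.
pub fn write_frame<W: Write>(writer: &mut W, payload: &[u8]) -> io::Result<()> {
    if payload.len() > MAX_FRAME_LEN as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "frame too large"));
    }
    writer.write_all(&(payload.len() as u32).to_be_bytes())?;
    writer.write_all(payload)
}

/// Reads one frame, rejecting oversized lengths before allocating the payload buffer.
pub fn read_frame<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut prefix = [0u8; 4];
    reader.read_exact(&mut prefix)?;
    let len = u32::from_be_bytes(prefix);
    if len > MAX_FRAME_LEN {
        // The peer controls this value, so it is validated before any allocation.
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len as usize];
    reader.read_exact(&mut payload)?;
    Ok(payload)
}
```

Checking the prefix first means a peer that announces a 4 GiB frame is rejected after reading four bytes, without reserving memory for the payload.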
Pull request overview
Adds the first slice of a QUIC/mTLS “agent tunnel” system: a shared binary protocol crate, a Gateway-side QUIC listener/registry/enrollment API, and an Agent-side enrollment + reconnecting tunnel client. This enables routing Gateway-initiated TCP proxy sessions through outbound-connected agents (for private-network reachability).
Changes:
- Introduces `agent-tunnel-proto` crate (control/session messages, framing, protocol versioning).
- Adds Gateway agent-tunnel core (`agent_tunnel` module), config wiring, REST endpoints, and token claim support (`jet_agent_id`) used in the forwarding path.
- Adds Agent enrollment/bootstrap + QUIC tunnel client with auto-reconnect and domain auto-detection.
Reviewed changes
Copilot reviewed 35 out of 36 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| devolutions-gateway/tests/config.rs | Updates config samples to include agent_tunnel field. |
| devolutions-gateway/src/token.rs | Adds jet_agent_id to association claims; adjusts scope token claims serialization/visibility. |
| devolutions-gateway/src/service.rs | Initializes and registers the agent-tunnel listener task when enabled. |
| devolutions-gateway/src/ngrok.rs | Threads agent_tunnel_handle into the TCP tunnel client path. |
| devolutions-gateway/src/middleware/auth.rs | Adds auth exception for /jet/agent-tunnel/enroll (self-auth via bearer token). |
| devolutions-gateway/src/listener.rs | Threads agent_tunnel_handle into the generic client path. |
| devolutions-gateway/src/lib.rs | Exposes agent_tunnel module and adds agent_tunnel_handle to DgwState. |
| devolutions-gateway/src/generic_client.rs | Uses jet_agent_id to route Fwd connections through the agent tunnel. |
| devolutions-gateway/src/extract.rs | Adds request extractors for agent-management read/write access control. |
| devolutions-gateway/src/config.rs | Adds AgentTunnelConf to Gateway config DTO and runtime config. |
| devolutions-gateway/src/api/webapp.rs | Ensures new jet_agent_id claim is present (set to None) when minting tokens. |
| devolutions-gateway/src/api/mod.rs | Nests the new /jet/agent-tunnel/* router. |
| devolutions-gateway/src/api/agent_enrollment.rs | Implements enrollment + agent management endpoints (list/get/delete/resolve-target). |
| devolutions-gateway/src/agent_tunnel/mod.rs | Declares agent-tunnel submodules and re-exports core types. |
| devolutions-gateway/src/agent_tunnel/listener.rs | QUIC UDP listener event loop + proxy-stream request dispatching. |
| devolutions-gateway/src/agent_tunnel/enrollment_store.rs | In-memory single-use enrollment token store with expiry. |
| devolutions-gateway/src/agent_tunnel/stream.rs | Tokio AsyncRead/AsyncWrite wrapper over QUIC streams via channels. |
| devolutions-gateway/src/agent_tunnel/registry.rs | Agent registry with heartbeat liveness + subnet/domain routing selection. |
| devolutions-gateway/src/agent_tunnel/connection.rs | Managed quiche connection: handshake identity, control parsing, proxy stream setup. |
| devolutions-gateway/src/agent_tunnel/cert.rs | CA manager for enrollment signing + server cert issuance and cert parsing helpers. |
| devolutions-gateway/Cargo.toml | Adds QUIC/proto/cert/routing dependencies for the tunnel feature. |
| devolutions-agent/src/service.rs | Registers TunnelTask when tunnel is enabled; fixes conf_handle cloning for RDP task. |
| devolutions-agent/src/main.rs | Adds CLI support for enroll/up bootstrap flows and parsing helpers + tests. |
| devolutions-agent/src/lib.rs | Exposes new modules: tunnel, enrollment, domain_detect. |
| devolutions-agent/src/enrollment.rs | Implements enrollment request + persistence of certs/config merge. |
| devolutions-agent/src/domain_detect.rs | Adds Windows/Linux DNS domain auto-detection helper. |
| devolutions-agent/src/tunnel.rs | Implements reconnecting QUIC client + control/session stream handling and TCP proxying. |
| devolutions-agent/src/config.rs | Adds tunnel config section; makes save_config/get_conf_file_path public. |
| devolutions-agent/Cargo.toml | Adds proto/quiche/reqwest/rcgen dependencies and Windows feature for domain detection. |
| crates/agent-tunnel-proto/src/lib.rs | Defines the protocol crate API surface and exports. |
| crates/agent-tunnel-proto/src/version.rs | Adds protocol version constants + validation helper. |
| crates/agent-tunnel-proto/src/error.rs | Defines protocol-level error types. |
| crates/agent-tunnel-proto/src/control.rs | Adds control-plane message definitions + framed encode/decode. |
| crates/agent-tunnel-proto/src/session.rs | Adds session-plane message definitions + framed encode/decode. |
| crates/agent-tunnel-proto/Cargo.toml | New crate manifest and dependencies. |
| Cargo.lock | Locks new dependencies introduced for QUIC, cert handling, registry, and protocol crate. |
Add QUIC-based agent tunnel core infrastructure. Agents in private
networks connect outbound to Gateway via QUIC/mTLS, advertise reachable
subnets and domains, and proxy TCP connections on behalf of Gateway.
Protocol (agent-tunnel-proto crate):
- RouteAdvertise with subnets + domain advertisements
- ConnectMessage/ConnectResponse for session stream setup
- Heartbeat/HeartbeatAck for liveness detection
- Protocol version negotiation (v2)
Gateway (agent_tunnel module):
- QUIC listener with mTLS authentication
- Agent registry with subnet/domain tracking
- Certificate authority for agent enrollment
- Enrollment token store (one-time tokens)
- Bidirectional proxy stream multiplexing
Agent (devolutions-agent):
- QUIC client with auto-reconnect and exponential backoff
- Agent enrollment with config merge (preserves existing settings)
- Domain auto-detection (Windows: USERDNSDOMAIN, Linux: resolv.conf)
- Subnet validation on incoming connections
- Certificate file permissions (0o600 on Unix)
API endpoints:
- POST /jet/agent-tunnel/enroll — agent enrollment
- GET /jet/agent-tunnel/agents — list agents
- GET /jet/agent-tunnel/agents/{id} — get agent
- DELETE /jet/agent-tunnel/agents/{id} — delete agent
- POST /jet/agent-tunnel/agents/resolve-target — routing diagnostics
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
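The "auto-reconnect and exponential backoff" item in the agent list above can be sketched with the standard library alone. This is an illustrative assumption, not the agent's actual reconnect code: `reconnect_delay` and its parameters are hypothetical names, and jitter is omitted for brevity.

```rust
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based): doubles from `base`, capped at `max`.
/// Hypothetical helper for illustration only.
pub fn reconnect_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt, saturating instead of overflowing on large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    // If the multiplication itself overflows, fall back to the cap.
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}
```

With a 1 s base and a 60 s cap, attempts 0 through 5 wait 1, 2, 4, 8, 16, and 32 seconds, and every later attempt waits 60 seconds.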
- ConnectMessage → ConnectRequest (precise naming)
- Move encode/decode into ControlStream/SessionStream wrappers (actor-on-object: ctrl.send(&msg) instead of msg.encode(&mut stream))
- ControlStream.into_split() → ControlSendStream + ControlRecvStream (compile-time separation, no phantom halves)
- From<(S, R)> for stream wrappers (connection.open_bi().await?.into())
- Rename spawned tasks: run_control_reader, run_session_proxy, run_agent_connection, run_control_loop
- Spawned tasks own args and handle errors internally
- Collect JoinHandles, abort all on shutdown
- Extract helpers to tunnel_helpers.rs
- Document backoff strategy with examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CaManager::load_or_generate returns Arc<Self> directly
- Rename enrollment token consume → redeem
- Remove unused resolve-target API endpoint + helpers + tests
- Remove routing methods from registry (PR2 scope)
- Remove Option from RouteAdvertisementState (empty = no routes)
- Target enum for typed IP vs domain parsing
- Prefix variables clearly (server_cert_*, ca_*)
- Add TODO for traffic audit and Windows DACL
- Backoff strategy documented with examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
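The "Target enum for typed IP vs domain parsing" item can be pictured roughly as follows. The variant names, the `parse` constructor, and the lowercase normalization are assumptions for illustration; the PR's actual `Target` type is not shown in this conversation.

```rust
use std::net::IpAddr;

/// Hypothetical shape of a typed connection target.
#[derive(Debug, PartialEq, Eq)]
pub enum Target {
    Ip(IpAddr),
    Domain(String),
}

impl Target {
    /// Anything that parses as an IP literal is an `Ip`; everything else is
    /// treated as a domain name and lowercased for later comparison.
    pub fn parse(host: &str) -> Target {
        match host.parse::<IpAddr>() {
            Ok(ip) => Target::Ip(ip),
            Err(_) => Target::Domain(host.to_ascii_lowercase()),
        }
    }
}
```

Parsing once at the boundary lets later routing code match on the enum (subnet lookup for `Ip`, suffix lookup for `Domain`) instead of re-parsing strings.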
Address review feedback from Benoit and Marc-André:
- Rename HTTP mountpoint /jet/agent-tunnel → /jet/tunnel
- Replace SkipHostnameVerification with SpkiPinnedVerifier that performs full chain + hostname + SPKI pin validation
- Enrollment response now includes server_spki_sha256 for pinning
- Agent sends machine hostname; gateway adds it as DNS SAN alongside the UUID SAN (dual names for future direct connectivity)
- Agent connects using real gateway hostname instead of dummy value
- Move sha2/hex to cross-platform deps, add x509-parser + hostname
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove agent_name from EnrollResponse (agent knows it already)
- Agent generates its own UUID and sends it in EnrollRequest
- Rename api/agent_enrollment.rs → api/tunnel.rs (match endpoint)
- Use backoff crate for reconnect loop (same pattern as subscriber.rs)
- ALPN: "devolutions-agent-tunnel" → "gw-agent-tunnel/1" (versioned)
- Protocol version: 2 → 1 (previous was experimental, start fresh)
- Move session tests to integration test file (public API only)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SanType::Rfc822Name → SanType::URI for urn:uuid: (correct X.509 type)
- GeneralName::RFC822Name → GeneralName::URI in extraction
- Reject duplicate agent UUID on enrollment (409 Conflict)
- tokio::join! instead of select! for session proxy (prevents data loss)
- JoinSet instead of Vec<JoinHandle> (prevents unbounded growth)
- Timeout (30s) on session handshake recv_request/recv_response
- Fix typos: "redeemd" → "redeemed"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move current_time_millis() to agent-tunnel-proto (R1: eliminate duplication)
- Delete DomainInfo, use DomainAdvertisement directly in AgentInfo (R2)
- Merge enroll_agent/bootstrap_and_persist into single function (I1)
- Agent task_handles: Vec<JoinHandle> → JoinSet with reaping (I4)
- Same-epoch route refresh: mutate updated_at in place, no clone (I5)
- Add #[must_use] on enrollment_store::redeem() (I6)
- connect_via_agent: cleaner error extraction with if-let (I3)
- Add TODO for active_stream_count tracking (I2)
- SECS_PER_DAY constant replaces magic 86400 (P4)
- Consistent .context() for ProtoError instead of map_err (P7)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 38 out of 39 changed files in this pull request and generated 4 comments.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Hoist protocol version validation before match in both gateway and agent control loops (single check, no per-variant boilerplate)
- Validate ConnectResponse protocol version in connect_via_agent
- ServerCertStatus enum for ensure_server_cert (expiry + hostname SAN)
- send.finish() after proxy copy (graceful QUIC EOF)
- Fix constant_time_eq doc (inaccurate timing claim)
- Extract ALPN to agent_tunnel_proto::ALPN_PROTOCOL constant
- Destruct EnrollResponse at parameter level for readability
- ValidatedTunnelConf: make wrong state unrepresentable at type level (dto::TunnelConf for JSON, TunnelConf for runtime with non-optional fields)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
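The "make wrong state unrepresentable" item describes a common split between a permissive DTO and a strict runtime type. The sketch below shows that pattern under assumed names: the fields `gateway_endpoint` and `client_cert_path` and the `validate` method are hypothetical and do not reflect the PR's real configuration schema.

```rust
/// Shape as it might appear in a JSON file: every tunnel field is optional.
#[derive(Debug, Default, Clone)]
pub struct TunnelConfDto {
    pub enabled: bool,
    pub gateway_endpoint: Option<String>,
    pub client_cert_path: Option<String>,
}

/// Runtime shape: if a value of this type exists, every field is present.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TunnelConf {
    pub gateway_endpoint: String,
    pub client_cert_path: String,
}

impl TunnelConfDto {
    /// `Ok(None)` when the tunnel is disabled, `Err` when enabled but incomplete.
    pub fn validate(self) -> Result<Option<TunnelConf>, String> {
        if !self.enabled {
            return Ok(None);
        }
        let gateway_endpoint = self.gateway_endpoint.ok_or("missing gateway_endpoint")?;
        let client_cert_path = self.client_cert_path.ok_or("missing client_cert_path")?;
        Ok(Some(TunnelConf { gateway_endpoint, client_cert_path }))
    }
}
```

Code that receives a `TunnelConf` never needs to unwrap an `Option`; the only place an incomplete configuration can be observed is the single `validate` call at load time.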
    fn parse_enrollment_string(value: &str) -> Result<EnrollmentStringPayload> {
        const PREFIX: &str = "dgw-enroll:v1:";

        let encoded = value.strip_prefix(PREFIX).context("invalid enrollment string prefix")?;

        let decoded = base64::engine::general_purpose::URL_SAFE_NO_PAD
            .decode(encoded)
            .context("invalid base64 enrollment string")?;

        let payload: EnrollmentStringPayload =
            serde_json::from_slice(&decoded).context("invalid enrollment string payload")?;

        if payload.version != 1 {
            bail!("unsupported enrollment string version: {}", payload.version);
        }

        Ok(payload)
    }
suggestion: Switch to JWT instead of a custom format
suggestion: Use the same approach as jmux-proto. Do not use serde and bincode.
    base64 = "0.22"
    bincode = "1.3"
    ipnetwork = "0.20"
    dashmap = "6.1"
question: Do we really need dashmap?
suggestion: I see a lot of new dependencies. Maybe reevaluate the dependencies what is absolutely necessary and what could be removed. I see pull multiple libraries to parse PEM files… Pretty sure we already had something before pem and rustls-pem.
suggestion: Extract more logic into a separate crate, the same way we did for the network scanner. agent-tunnel-proto (already existing) + agent-tunnel.
- `dashmap::DashMap` → `tokio::sync::RwLock<HashMap>` in
`enrollment_store`, `listener`, `registry`. All lookups/inserts await
the lock; values are cloned out (Arc/quinn::Connection) so no guards
escape the critical section.
- `pem` crate → `rustls_pemfile::certs` via a small `cert_pem_to_der`
helper in `cert.rs`. The CSR tamper test now uses `base64` directly
for PEM encode/decode.
- `bincode` + `serde` → hand-rolled binary encoding in
`agent-tunnel-proto`, following the `jmux-proto` pattern:
* `FramedSend<S>` / `FramedRecv<R>` handle length-prefixed framing
and encode/decode via private `Encode` / `Decode` traits.
* `ControlStream` / `SessionStream` compose `FramedSend` +
`FramedRecv` with their respective max frame sizes; no more free
`write_framed` / `read_framed` helpers.
* Each `ControlMessage` / `ConnectRequest` / `ConnectResponse`
variant has an explicit wire layout with tag bytes, big-endian
integers, u32-length-prefixed strings, and explicit IPv4 framing.
* `serde` becomes an optional feature on the proto crate, enabled by
`devolutions-gateway` for its JSON API (`DomainAdvertisement`
serialization); `devolutions-agent` drops it entirely.
All 18 proto tests (roundtrip + proptest) pass unchanged.
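The wire layout described in this commit (tag bytes, big-endian integers, u32-length-prefixed strings) can be sketched as below. The `Message` enum, its two variants, and the tag values are hypothetical stand-ins chosen for illustration; they are not the real `ControlMessage` layout from `agent-tunnel-proto`.

```rust
/// Hypothetical two-variant message used only to illustrate the layout.
#[derive(Debug, PartialEq, Eq)]
pub enum Message {
    Heartbeat { timestamp_ms: u64 },
    Error { reason: String },
}

const TAG_HEARTBEAT: u8 = 0x01; // assumed tag value
const TAG_ERROR: u8 = 0x02; // assumed tag value

/// Heartbeat: `[tag][u64 BE]`. Error: `[tag][u32 BE length][UTF-8 bytes]`.
pub fn encode(msg: &Message, out: &mut Vec<u8>) {
    match msg {
        Message::Heartbeat { timestamp_ms } => {
            out.push(TAG_HEARTBEAT);
            out.extend_from_slice(&timestamp_ms.to_be_bytes());
        }
        Message::Error { reason } => {
            out.push(TAG_ERROR);
            out.extend_from_slice(&(reason.len() as u32).to_be_bytes());
            out.extend_from_slice(reason.as_bytes());
        }
    }
}

/// Returns `None` on an unknown tag, a truncated body, trailing bytes, or invalid UTF-8.
pub fn decode(buf: &[u8]) -> Option<Message> {
    let (&tag, rest) = buf.split_first()?;
    match tag {
        TAG_HEARTBEAT => {
            if rest.len() != 8 {
                return None;
            }
            let mut bytes = [0u8; 8];
            bytes.copy_from_slice(rest);
            Some(Message::Heartbeat { timestamp_ms: u64::from_be_bytes(bytes) })
        }
        TAG_ERROR => {
            if rest.len() < 4 {
                return None;
            }
            let (len_bytes, body) = rest.split_at(4);
            let mut len = [0u8; 4];
            len.copy_from_slice(len_bytes);
            if body.len() != u32::from_be_bytes(len) as usize {
                return None;
            }
            String::from_utf8(body.to_vec()).ok().map(|reason| Message::Error { reason })
        }
        _ => None,
    }
}
```

Because every field has a fixed, documented position, the decoder can reject malformed input with plain length checks and does not depend on a serialization framework's internal limits.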
Addresses Benoit's review comment: "Switch to JWT instead of a custom format". The old `dgw-enroll:v1:<base64-JSON>` envelope is replaced with a standard JWT that carries the same information via JWT claims and doubles as the Bearer token for `/jet/tunnel/enroll`.
Gateway:
- Add `AccessScope::TunnelEnroll` and a dedicated `EnrollmentTokenClaims` struct with `scope`, `exp`, `jti`, `jet_gw_url` (required) and `jet_agent_name` (optional). The `jet_*` prefix matches the existing convention for gateway-specific custom claims (`jet_aid`, `jet_ap`, `jet_gw_id`, ...).
- Add `validate_enrollment_jwt` in `api/tunnel.rs` (ported from feature branch). Verifies signature against `provisioner_public_key`, checks `exp`/`nbf` via picky's strict validator, and enforces scope is `TunnelEnroll` or `Wildcard`.
- `enroll_agent` now tries JWT first, then the one-time token store, then the static `enrollment_secret` as a fallback.
- 7 unit tests cover the happy path, wildcard scope, wrong scope, expiry, signature mismatch, missing required claim, and malformed input.
Agent:
- Replace `EnrollmentStringPayload` / `parse_enrollment_string` with `EnrollmentJwtClaims` / `parse_enrollment_jwt`. The parser splits on `.` and decodes the payload segment without verifying the signature (agent is the intended recipient; the Gateway verifies on enrollment).
- The JWT string itself becomes the Bearer token; there is no longer a separate `enrollment_token` field nested inside a custom envelope.
- 3 tests: happy path via `parse_up_command_args`, malformed rejection, and missing-`jet_gw_url` rejection.
Also fixes the pre-existing inline registry tests that broke in the previous commit when `DashMap` → `tokio::sync::RwLock<HashMap>` made `AgentRegistry` methods async and `DomainAdvertisement.domain` became a `DomainName` newtype.
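The agent-side step described above (split on `.` and decode the payload segment without verifying the signature) might look roughly like this. It is a standard-library-only sketch: `unverified_jwt_payload` and `base64url_decode` are hypothetical names, the hand-written decoder stands in for whatever base64 routine the agent really uses, and the JSON parsing of the claims is left out.

```rust
/// Decodes unpadded base64url text. Returns `None` on any character outside
/// the URL-safe alphabet or on an impossible length. Leftover bits in the
/// final character are ignored, which is more lenient than a strict decoder.
pub fn base64url_decode(input: &str) -> Option<Vec<u8>> {
    if input.len() % 4 == 1 {
        return None;
    }
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut acc: u32 = 0;
    let mut bits: u32 = 0;
    for byte in input.bytes() {
        let value = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            _ => return None,
        };
        acc = (acc << 6) | u32::from(value);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8);
            acc &= (1 << bits) - 1;
        }
    }
    Some(out)
}

/// Returns the raw payload bytes of a three-segment JWT WITHOUT checking the signature.
/// The result must only be used as a hint (for example, which URL to contact),
/// never as proof of authorization.
pub fn unverified_jwt_payload(token: &str) -> Option<Vec<u8>> {
    let mut parts = token.split('.');
    let (_header, payload, _signature) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() {
        return None;
    }
    base64url_decode(payload)
}
```

Reading the claims unverified is acceptable here only because the commit message states that the Gateway verifies the signature when the token is presented at enrollment.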
Addresses Benoit's review comment: "Extract more logic into a separate crate, the same way we did for the network scanner. `agent-tunnel-proto` (already existing) + `agent-tunnel`."
The agent tunnel module was already self-contained (zero `use crate::*` imports), so the extraction is a mechanical move:
- Create `crates/agent-tunnel/` as a new workspace crate
- Move `cert.rs`, `enrollment_store.rs`, `listener.rs`, `registry.rs`, `stream.rs` from `devolutions-gateway/src/agent_tunnel/` (git tracks these as renames)
- New `lib.rs` does the `#[macro_use] extern crate tracing` dance and re-exports the public surface (`AgentTunnelHandle`, `AgentTunnelListener`, `AgentRegistry`, `EnrollmentTokenStore`, `TunnelStream`)
- Delete `devolutions-gateway/src/agent_tunnel/mod.rs`
- Gateway now depends on `agent-tunnel` as a path dependency; call sites change `crate::agent_tunnel::*` → `agent_tunnel::*`
Also promote `Encode` / `Decode` in `agent-tunnel-proto::codec` from `pub(crate)` to `pub` so `FramedSend::send` / `FramedRecv::recv` (which bound on them) are reachable in the new crate without `private_bounds` warnings.
Tests: 20 moved from gateway inline into the new crate and all still pass; gateway still has 64 lib tests + all integration tests green; agent + proto tests untouched.
Review-agent findings addressed:
- Drop `ControlMessage`/`ConnectRequest`/`ConnectResponse` inherent `encode`/`decode` methods. They duplicated the `Encode`/`Decode` trait impls with identical signatures, so callsites and rustdoc saw two methods for one job. Only the trait impls remain; stream wrappers already go through the traits.
- `RouteAdvertisementState::update_routes` same-epoch branch now logs the *incoming* subnet/domain counts (previously re-logged the existing state's count, which read as if we had accepted the new set) and makes it explicit in the message that incoming routes are ignored.
- Rename `constant_time_eq` → `timing_safe_eq`. The function hashes inputs first and only the 32-byte digest compare is constant-time. New name describes intent; doc comment now explains both what the hash normalization buys and what the function does *not* guarantee.
- Document that `EnrollmentTokenStore::redeem` removes expired tokens as a side effect (so callers cannot distinguish "missing" from "expired", and shouldn't).
- Explain in `parse_enrollment_jwt` why we handroll the split/decode instead of pulling `picky` into the agent for unverified payload reading.
- Move `use agent_tunnel_proto::current_time_millis;` to the top of `registry.rs` with the other imports (was dangling at module bottom after the IPv4-only revert).
- Apply `cargo fmt`.
Tests: 20 agent-tunnel + 13 agent-tunnel-proto + 5 session_roundtrip + 64 gateway lib + 5 devolutions-agent, all green. Zero clippy warnings on the changed crates.
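The `redeem` behaviour noted above (single use, expired tokens removed as a side effect, "missing" and "expired" indistinguishable to the caller) can be modelled with a small synchronous store. This is an assumption-laden sketch: `TokenStore`, its fields, and the explicit `now` parameter are hypothetical, and the real `EnrollmentTokenStore` sits behind an async lock.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory store mapping each token to its expiry instant.
#[derive(Default)]
pub struct TokenStore {
    tokens: HashMap<String, Instant>,
}

impl TokenStore {
    pub fn insert(&mut self, token: &str, ttl: Duration, now: Instant) {
        self.tokens.insert(token.to_owned(), now + ttl);
    }

    /// Consumes the token. Returns `true` only if it existed and had not expired.
    /// The entry is removed in every case, so a token can never be redeemed twice
    /// and an expired token looks the same as a missing one.
    #[must_use]
    pub fn redeem(&mut self, token: &str, now: Instant) -> bool {
        match self.tokens.remove(token) {
            Some(expires_at) => now < expires_at,
            None => false,
        }
    }
}
```

Passing `now` explicitly keeps the sketch deterministic in tests; a production store would more likely read the clock internally.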
- Drop the 1-byte IP family tag from each subnet on the wire. The type
is `Ipv4Network` so the tag could only ever be `0x04`. Encoding it
was a TODO-by-bytes that would have constrained a future v2 without
helping v1. Each subnet is now `[4B ipv4_octets][1B prefix]` — saves
a byte per subnet per RouteAdvertise. If IPv6 arrives, the wire bump
comes with a `protocol_version` bump and the format can reintroduce
a tag cleanly.
- Add six unit tests for `DomainName::matches_hostname` covering exact
match, case insensitivity, suffix match, rejected partial-label
("fakecontoso.local" vs "contoso.local"), unrelated hostname, and
parent vs child domain. The method is only called from PR2's routing
code; these tests make sure the algorithm is locked down on PR1 so
the PR2 consumer can rely on it.
- `devolutions-agent/src/tunnel.rs`: replace the `continue;` on
backoff exhaustion with a fall-through using a 1s floor. Previously,
if `backoff.next_backoff()` ever returned `None` (supposedly
unreachable with `max_elapsed_time(None)`), the loop would spin
without any sleep. Defensive fix, not a correctness one.
All 20 agent-tunnel / 19 agent-tunnel-proto / 64 gateway-lib / 5
session_roundtrip / 5 devolutions-agent tests still pass. Zero clippy
warnings on the changed crates.
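The per-subnet layout named in the first bullet, `[4B ipv4_octets][1B prefix]`, is small enough to sketch directly. The function names and the `prefix <= 32` check are assumptions for illustration; the real encoder works on `Ipv4Network` values inside the proto crate.

```rust
use std::net::Ipv4Addr;

/// Appends one subnet as four address octets followed by the prefix length.
pub fn encode_subnet(addr: Ipv4Addr, prefix: u8, out: &mut Vec<u8>) {
    out.extend_from_slice(&addr.octets());
    out.push(prefix);
}

/// Decodes exactly five bytes; rejects wrong lengths and prefixes above 32.
pub fn decode_subnet(buf: &[u8]) -> Option<(Ipv4Addr, u8)> {
    if buf.len() != 5 || buf[4] > 32 {
        return None;
    }
    Some((Ipv4Addr::new(buf[0], buf[1], buf[2], buf[3]), buf[4]))
}
```

For example, 10.0.0.0/8 encodes to the five bytes `0A 00 00 00 08`, with no family tag in front.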
Summary
QUIC-based agent tunnel (PR 1 of 4). Agents in private networks connect outbound to Gateway via QUIC/mTLS, advertise reachable subnets and domains, and proxy TCP connections. Pure Rust (Quinn + rustls), zero C dependencies.
See Technical Spec for protocol details.
PR stack
Highlights
🤖 Generated with Claude Code